43 research outputs found

Data Vaults: Database Technology for Scientific File Repositories

Author: Ivanova M.G. (Milena)
Kargin Y. (Yagiz)
Kersten M.L. (Martin)
Manegold S. (Stefan)
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2013

Current data-management systems and analysis tools fail to meet scientists’ data-intensive needs. A "data vault" approach lets researchers effectively and efficiently explore and analyze information.

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications

Lazy ETL in Action: ETL Technology Dates Scientific Data

Author: Ivanova M.G. (Milena)
Kargin Y. (Yagiz)
Kersten M.L. (Martin)
Manegold S. (Stefan)
Zhang Y. (Ying)
Publication date: 01/08/2013

Both scientific data and business data have analytical needs. Analysis takes place after a scientific data warehouse is eagerly filled with all data from external data sources (repositories). This is similar to the initial loading stage of Extract, Transform, and Load (ETL) processes that drive business intelligence. ETL can also help scientific data analysis. However, the initial loading is a time- and resource-consuming operation. It might not be entirely necessary, e.g., if the user is interested in only a subset of the data. We propose to demonstrate Lazy ETL, a technique to lower costs for initial loading. With it, ETL is integrated into the query processing of the scientific data warehouse. For a query, only the required data items are extracted, transformed, and loaded transparently on the fly. The demo is built around concrete implementations of Lazy ETL for seismic data analysis. The seismic data warehouse is ready for query processing, without waiting for long initial loading. The audience fires analytical queries to observe the internal mechanisms and modifications that realize each of the steps: lazy extraction, transformation, and loading.

CWI's Institutional Repository
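
The lazy extraction, transformation, and loading described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `LazyWarehouse` class, the `station, value` line format, and the `query_mean` method are hypothetical names chosen for illustration. The sketch only shows the general idea that a source is processed the first time a query touches it and is reused afterwards.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Tuple[str, float]  # (station, sample value); hypothetical schema


class LazyWarehouse:
    """Toy warehouse that runs ETL on a source only when a query needs it."""

    def __init__(self, sources: Dict[str, Callable[[], Iterable[str]]]):
        # Each source name maps to a function that reads its raw text lines.
        self._sources = sources
        self._loaded: Dict[str, List[Record]] = {}
        self.extract_count = 0  # how many sources were actually processed

    def _etl(self, name: str) -> List[Record]:
        if name not in self._loaded:
            self.extract_count += 1
            rows: List[Record] = []
            for line in self._sources[name]():                 # extract
                station, value = line.split(",")
                rows.append((station.strip(), float(value)))   # transform
            self._loaded[name] = rows                          # load
        return self._loaded[name]

    def query_mean(self, names: Iterable[str]) -> float:
        """Mean sample value over the named sources, loading them on demand."""
        values = [v for n in names for _, v in self._etl(n)]
        return sum(values) / len(values)


if __name__ == "__main__":
    raw = {
        "station_a": lambda: ["A, 1.0", "A, 3.0"],
        "station_b": lambda: ["B, 10.0"],
    }
    wh = LazyWarehouse(raw)
    print(wh.query_mean(["station_a"]), wh.extract_count)
```

In this sketch a query over `station_a` never reads `station_b`, and repeating the query does not trigger a second extraction.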

Instant-on scientific data warehouses: Lazy ETL for data-intensive research

Author: Ivanova M.G. (Milena)
Kargin Y. (Yagiz)
Kersten M.L. (Martin)
Manegold S. (Stefan)
Pirk H. (Holger)
Publication date: 01/08/2012

In the dawning era of data-intensive research, scientific discovery deploys data analysis techniques similar to those that drive business intelligence. Similar to classical Extract, Transform and Load (ETL) processes, data is loaded entirely from external data sources (repositories) into a scientific data warehouse before it can be analyzed. This process is both time- and resource-intensive and may not be entirely necessary if only a subset of the data is of interest to a particular user. To overcome this problem, we propose a novel technique to lower the costs for data loading: Lazy ETL. Data is extracted and loaded transparently on the fly only for the required data items. Extensive experiments demonstrate a significant reduction of the time from source data availability to query answer compared to state-of-the-art solutions. In addition to reducing the costs for bootstrapping a scientific data warehouse, our approach also reduces the costs for loading new incoming data.

CWI's Institutional Repository

Data Vaults: a Database Welcome to Scientific File Repositories

Author: Datcu M. (Mihai)
Espinoza Molina D.
Ivanova M.G. (Milena)
Kargin Y. (Yagiz)
Kersten M.L. (Martin)
Manegold S. (Stefan)
Zhang Y. (Ying)
Publication date: 01/01/2013

Efficient management and exploration of high-volume scientific file repositories have become pivotal for advancement in science. We propose to demonstrate the Data Vault, an extension of the database system architecture that transparently opens scientific file repositories for efficient in-database processing and exploration. The Data Vault facilitates scientific data analysis using high-level declarative languages, such as the traditional SQL and the novel array-oriented SciQL. Data of interest are loaded from the attached repository in a just-in-time manner without the need for up-front data ingestion. The demo is built around concrete implementations of the Data Vault for two scientific use cases: seismic time series and Earth observation images. The seismic Data Vault uses the queries submitted by the audience to illustrate the internals of Data Vault functioning by revealing the mechanisms of dynamic query plan generation and on-demand external data ingestion. The image Data Vault shows an application view from the perspective of data mining researchers.

CWI's Institutional Repository
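
The just-in-time loading from an attached file repository described in the abstract above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Data Vault architecture itself: the `DataVault` class, the one-line `key=value` file header, and the `select` method are all hypothetical. The sketch reads only the cheap metadata header of each file when the repository is attached, and reads a file's payload only when a query's metadata predicate selects it.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Tuple


class DataVault:
    """Toy vault: catalog metadata up front, ingest file payloads on demand."""

    def __init__(self, repo: Path):
        self.catalog: Dict[str, Dict[str, str]] = {}
        self.cache: Dict[str, List[float]] = {}  # payloads ingested so far
        for path in sorted(repo.glob("*.txt")):
            with path.open() as f:
                # Assumed header format, e.g. "station=A year=2010"
                header = f.readline().strip()
            self.catalog[str(path)] = dict(kv.split("=") for kv in header.split())

    def _payload(self, path: str) -> List[float]:
        if path not in self.cache:
            lines = Path(path).read_text().splitlines()[1:]  # skip the header
            self.cache[path] = [float(x) for x in lines]
        return self.cache[path]

    def select(self, predicate: Callable[[Dict[str, str]], bool]) -> List[float]:
        """Return samples from every file whose metadata satisfies predicate."""
        out: List[float] = []
        for path, meta in self.catalog.items():
            if predicate(meta):
                out.extend(self._payload(path))
        return out


def demo() -> Tuple[List[float], int]:
    """Build a two-file repository and query one station."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "a.txt").write_text("station=A year=2010\n1.5\n2.5\n")
        (repo / "b.txt").write_text("station=B year=2011\n9.0\n")
        vault = DataVault(repo)
        values = vault.select(lambda m: m["station"] == "A")
        return values, len(vault.cache)


if __name__ == "__main__":
    print(demo())
```

Here the query for station `A` ingests one of the two files; the other stays in the repository untouched until some later query selects it.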

International Migration, Integration and Social Cohesion online publications

Instant-on scientific data warehouses: lazy ETL for data-intensive research

Author: Ivanova M.G. (Milena)
Kargin Y. (Yagiz)
Kersten M.L. (Martin)
Manegold S. (Stefan)
Pirk H. (Holger)
Publication date: 01/01/2013

In the dawn of the data-intensive research era, scientific discovery deploys data analysis techniques similar to those that drive business intelligence. Similar to classical Extract, Transform and Load (ETL) processes, data is loaded entirely from external data sources (repositories) into a scientific data warehouse before it can be analyzed. This process is both time- and resource-intensive and may not be entirely necessary if only a subset of the data is of interest to a particular user. To overcome this problem, we propose a novel technique to lower the costs for data loading: Lazy ETL. Data is extracted and loaded transparently on the fly only for the required data items. Extensive experiments demonstrate a significant reduction of the time from source data availability to query answer compared to state-of-the-art solutions. In addition to reducing the costs for bootstrapping a scientific data warehouse, our approach also reduces the costs for loading new incoming data.

CWI's Institutional Repository